
WordPress Aggregation – FeedWordPress fix June 16, 2009

Posted by ficial in code fixes, techy.

The Quick-and-Dirty Version…

After a long and painful code dive I finally found and fixed a problem I was having with MagpieRSS and the FeedWordPress plugin. The short version: the RSS feeds would not update after the initial load, because the code that fetched the feed couldn’t properly parse the feed URL when that URL had multiple parameters in the query string. The quick solution was to add

 $url = preg_replace('/\&\#038\;/','&',$url);

in fetch_rss (just after checking to see that $url is set) in rss.php.

The Long-and-Involved Version…

Williams OIT runs a summer intern program. Each intern gets a WordPress site in which to write about their experiences in the program, as well as anything else they want. Posts the interns make about the program (or anything else they want aggregated) they put in the ‘aggregate’ category. We then run a single, central WP install for the program which uses FeedWordPress to pull in the RSS for the aggregate category from each of those sites. We ran into two significant problems with this plan.

The first was easily surmountable with a bit of on-line research. Essentially, the FeedWordPress admin tool didn’t want to accept the RSS links. The reasons why have something to do with the guts of WordPress HTTP handling, and luckily I didn’t have to worry about it because a nice work-around was described by Zemalf at the FeedWordPress site: http://projects.radgeek.com/2009/06/13/feedwordpress-20090613/#comment-24328. In brief, the RSS feeds are added via the WordPress standard Links control. It apparently doesn’t work for everyone, but it did the trick for me.

The next issue was much trickier. The symptom was that feeds once loaded would not update. That is, the initial pass (immediately after adding the feed) would handle any feed entries that were ready to go, but subsequent data from the source sites would not be read. After much cursing and searching I finally tracked the problem down to an interaction between the Links tool and the MagpieRSS module.

There were three major obstacles to tracking down this problem, and they interacted in ways that obscured each other. The first thing I encountered was the fact that the feeds were cached. There are various places where one can set the value of MAGPIE_CACHE_ON, but it’s not clear what order they’re called in and thus which setting will take precedence at any given time. Eventually I tracked things back to the place in the source code where that was being checked, and dropped in a simple hack to get around it. In wp-includes/rss.php around line 445 there’s a statement

 if ( !MAGPIE_CACHE_ON )

which I replaced with

 if ( true ) { # CSW : force updates always - cache is causing issues

Now every time I refresh the feed I know it’s going to the source rather than using its cache.
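Part of the precedence confusion comes from how PHP constants behave: once a constant is defined it cannot be redefined, so whichever define() runs first wins for the rest of the request. Here is a minimal sketch of that, assuming the various places that set MAGPIE_CACHE_ON use a guarded define (I have not checked every one of them, so treat that as an assumption). The helper name define_once is made up for illustration.

```php
<?php
// Hypothetical helper: define a constant only if nothing earlier has set it.
function define_once($name, $value) {
    if (!defined($name)) {
        define($name, $value);
        return true;   // this call set the value
    }
    return false;      // an earlier define() already won
}

// Whichever call runs first decides the value for the rest of the request.
$first  = define_once('MAGPIE_CACHE_ON', false); // e.g. set early by a config file or plugin
$second = define_once('MAGPIE_CACHE_ON', true);  // a later default is silently skipped

var_dump($first, $second, MAGPIE_CACHE_ON);
```

If the setting you add never seems to take effect, this is one possible reason: something that ran earlier got there first.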

The second problem was that the error reporting wasn’t giving any useful information, even with the FeedWordPress debugging turned on. The main problem in this case is that Magpie has an internal error function which calls the trigger_error PHP function, which means the error is always reported as being in the same place. That is, trigger_error only reports where IT is called, not where its enclosing function is called. To find out where the problem really was arising I needed a full stack trace, not just the last point of contact. To get that I added a new function (right after the error function in rss.php):

# CSW - function nabbed from http://us.php.net/manual/en/function.debug-backtrace.php and modified to handle non-string args and to produce HTML output
function getbacktracetext($trace) {
 $output="";
 foreach($trace as $t) {
   $output.="\n<br />File: ".$t['file']." (Line: ".$t['line'].")<br />\n";
   $output.="Function: ".$t['function']."<br />\n";
   $output.="Args: (";
   $acount = 0;
   foreach ($t['args'] as $arg) {
     $output .= ($acount > 0 ? ',' : '');
     $output .= (is_string($arg) ? $arg : serialize($arg));
     $acount++;
   }
   $output.=")<br />\n";
 }
 return $output;
}

then in the original error function I added a statement to get the stack trace:

 $errormsg .= $this->getbacktracetext(debug_backtrace());

However, even this did not quite suffice. I was getting very strange behavior when displaying the error messages. It turns out this arose from the display of the arguments ‘$output .= (is_string($arg) ? $arg : serialize($arg));’. The argument I was trying to display was a string that was the HTML of a web page, starting with the headers. Since that page did things like define style-sheets and javascript I was getting some really strange results (especially when I’d try to update several feeds at once and it would cycle through various style sheets). I finally just commented out that line to get the full stack to display – it was useful to learn that the argument was HTML rather than RSS XML, but beyond that I was more concerned with the execution path.
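In hindsight, a gentler option than commenting the line out would have been to escape the arguments before printing them. The following is a sketch of that idea, not the code I actually ran: the function name getbacktracetext_safe and the $maxLen parameter are made up for illustration, and the stack frame at the bottom is hand-built rather than captured from Magpie.

```php
<?php
// Sketch of a backtrace formatter that escapes everything it prints.
// $maxLen is an illustrative parameter, not part of the original function.
function getbacktracetext_safe(array $trace, $maxLen = 80) {
    $output = '';
    foreach ($trace as $t) {
        // 'file', 'line' and 'args' are not present on every frame.
        $file = isset($t['file']) ? $t['file'] : '[internal]';
        $line = isset($t['line']) ? $t['line'] : '?';
        $args = array();
        foreach (isset($t['args']) ? $t['args'] : array() as $arg) {
            // Strings are truncated; other values are reported by type only.
            $args[] = is_string($arg) ? substr($arg, 0, $maxLen) : gettype($arg);
        }
        $output .= "\n<br />File: " . htmlspecialchars($file) . " (Line: " . $line . ")<br />\n";
        $output .= "Function: " . htmlspecialchars($t['function']) . "<br />\n";
        $output .= "Args: (" . htmlspecialchars(implode(',', $args)) . ")<br />\n";
    }
    return $output;
}

// Usage with a hand-built frame whose first argument is an HTML page.
$trace = array(array(
    'file' => 'rss.php', 'line' => 86, 'function' => 'MagpieRSS',
    'args' => array('<html><head><style>body{}</style></head>', 42),
));
echo getbacktracetext_safe($trace);
```

Because the arguments pass through htmlspecialchars, a fetched web page shows up as visible text in the error output instead of being interpreted by the browser.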

So, now when I got the error I could see that it arose at line 86 in rss.php. That’s where Magpie tries to parse the XML for the feed. I continued checking further back in the stack and eventually ended up at the fetch_rss function (this is the same area I put in the earlier hack to force no-cache). From checking via the browser I knew the URL I wanted to use was valid, so I put in

 error_log ("fetching RSS url of $url"); # CSW

just to make sure the system was fetching what I thought it was, and lo and behold, it was NOT!

FeedWordPress is driven by the standard Links tool; it just looks at links in a particular category and adds some extra info to the link notes. Since we were getting the RSS for a category our URLs looked like http://foo/?cat=3&feed=rss2. When this URL was saved, the & in it was converted to ‘&#038;’, giving a stored value like http://foo/?cat=3&#038;feed=rss2. Magpie then tries to fetch that RSS, but it doesn’t un-encode the string, so instead of getting the actual feed it gets that category as a web page, and the meaningless parameter ‘#038;feed’ is ignored. Unsurprisingly, Magpie could not parse the HTML as an RSS feed, and so it died.
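To see why the feed parameter goes missing, here is a small sketch that splits the two query strings the way PHP does on the receiving end. It assumes everything after the ? reaches the server as the query string; a client that treats # as the start of a fragment would lose the feed parameter too, just by a different route.

```php
<?php
// The stored (entity-encoded) query string versus the intended one.
$stored   = 'cat=3&#038;feed=rss2';
$intended = 'cat=3&feed=rss2';

parse_str($stored, $broken);  // split on '&' into name/value pairs
parse_str($intended, $good);

var_dump(isset($broken['feed'])); // is there a usable 'feed' parameter?
var_dump(isset($good['feed']));
print_r(array_keys($broken));     // the parameter names actually received
```

In the encoded version the cat parameter survives but there is no parameter named feed, so the site answers with an ordinary category page.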

This brings me to the third problem. When I did initial testing to see whether FeedWordPress would work I only tried a top-level feed. That is, my feed URL looked like http://foo/?feed=rss2. Since there was only a single parameter-value pair in the query string this problem was not exposed and all looked good. When I tried to put it in production pulling data from a category, everything mysteriously broke.

The final solution (NOT fully tested, but works for us) turned out to be quite simple. I just added a line in rss.php in the fetch_rss function (right after the check to see if the $url param is set) to do the appropriate decoding:

 $url = preg_replace('/\&\#038\;/','&',$url); #CSW FIX!!! Process was bombing out on feeds with multiple URL params

and VOILA! It works!
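For anyone who would rather not touch a regex, the same decoding can be sketched with a plain string replacement. The helper name decode_feed_url is hypothetical, and handling the named form of the entity as well is my own guess at what else might turn up in a stored URL, not something I observed.

```php
<?php
// Hypothetical helper: undo the entity encoding of '&' in a stored feed URL.
function decode_feed_url($url) {
    // Covers the numeric form seen in this post plus the named form (a guess).
    return str_replace(array('&#038;', '&amp;'), '&', $url);
}

echo decode_feed_url('http://foo/?cat=3&#038;feed=rss2'), "\n";
echo decode_feed_url('http://foo/?feed=rss2'), "\n";
```

Since str_replace matches literally, none of the characters need escaping, and a URL with a single parameter passes through unchanged.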


Comments»

1. Ronit - June 16, 2009

For aggregating multiple feeds, my favorite is Drupal’s built in Feed Aggregator module. Example:

http://ephblog.com/drupal/

2. ficial - June 16, 2009

Thanks for the heads up. Looks like a nice tool, but as far as I can tell it operates only in the drupal realm, not in wordpress. Plus, it seems to require a cron job. That’s not a terrible thing, but I prefer the approach of checking only on demand (and using a cache and/or a timeout if you want to designate a minimum time between refreshes).

3. Ronit - June 16, 2009

Yeah, we rely on a Feedburner widget to actually get it on the main page of EphBlog (which runs on WordPress). It does the job, more or less.

By the way, good to see that WIT is still going strong! (I interned and then student-managed the following year). It seems like you guys are standardizing on using WordPress to build most of your sites? This is not a bad idea, WordPress is surprisingly flexible and extensible with plugins, and takes very little time to learn (in my day, we built every site by writing php code by hand, uphill both ways in the snow).

4. Manuel - June 19, 2009

I tried to replicate the step you took but this line looks cut off:

if ( true ) { # CSW : force updates always – cache is causi

Can you just email a copy of the modified rss.php you are using?

5. Carlos - July 28, 2009

Very helpful post.
Thanks.

6. nick - August 5, 2009

If I could reach across this digital ocean and hug you I would. Thanks, this saved me

7. Japanese Lesson - August 13, 2009

I can’t figure this out. Where am I adding this? can you email me a copy of the rss.php file?

ficial - August 17, 2009

8. sdhunt - October 4, 2009

Hi,

Thanks for the fix.
Have you had problems with feedwordpress posting duplicate posts?
It seems to be a problem most feedwordpress users are running into but I can’t find a way to stop it.
I tried the Feedwordpress Duplicate Filter plugin but that hasn’t seemed to make any difference.
Just thought you might have some ideas.
Thanks

ficial - November 3, 2009

I haven’t seen that behavior in our system. We’re doing a feed from a set of pretty controlled and low-volume sites, so that may be why…

9. IronShef - October 23, 2009

I’m running into similar problems. I just upgraded to WP 2.8.5. When I replace the rss.php file with the code you’ve written, I’m getting a syntax error.

Back on 2.7.1 there was feedback that the rss.php file needed to be copied over to the wp-includes directory. Is this still the case? If so, do I need to update both files for this to work?

Any insight you can offer will be appreciated!

ficial - November 3, 2009

Ach – sorry for the late response. I’m afraid I haven’t delved into this issue at all. Most of my wordpress development efforts are for work projects, and all my work projects of late haven’t been in this area. So, sadly, I don’t have any advice on this front. I’ll probably have to deal with this issue next summer, so I’m just waiting and hoping there’s a general fix in place by then :)

10. saad - December 14, 2009

yes i was having this same problem that my feedwordpress wasn’t fetching feeds! so all you have to do is delete feedwordpress dublicate fix! it will start working again! and get newer version of feedwordpress, i searched the fix for 2 hrs and finally found in some1’s tweet tht it is tht simple to fix lol

here’s the link, god bless tht person lol

11. free streaming soccer - December 14, 2009

just getting a backlink for my tip :$ hope you dont mind :)

Saad

12. ficial - December 14, 2009

No problem Saad :)

13. Zoli Erdos - January 5, 2010

Saad, this is so ironic – I’m still searching for the solution and find my own tweet referenced here:-)

Yes, disabling the Dupe filter allows FeedWordpress to resume updates, but the result is a disaster, at least in my setup: posts that I had manually edited, made changes to and published now get overwritten with the original by FeedWordpress. The Dupe Filter protects against that – until it clogs up.

It’s a Catch 22. I’ve emailed both authors, but no solution yet. :-(

14. alexandrasamuel - January 25, 2010

Hurrah! I was going to head down the rss.php path described up top, ’til I came across the note about the Duplicate Filter plugin. Deactivated it, and I’m now happily aggregating again. Thanks!

